2019-11-25

–>

Agenda

  1. Me
  2. Lessons Learned
  3. Containers
  4. Kubernetes
  5. Containers for Data Scientists
  6. Q/A
  7. Advertisement

Troy Hernandez, PhD

Education - UIC

Bachelors - Philosophy/Mathematics

Masters - Game Theory/OR

PhD - Statistics/Machine Learning

Postdoc - Tsinghua University, Beijing

Professional

Civis Analytics - Data Scientist

Vivaki - Data Scientist

Sears Holdings - Data Scientist

IBM - Open Source Analytics Technical Evangelist

IBM - Executive Architect

Personal

Cubs MtG Music Politics

Lessons Learned

Containers

Docker

Open Container Initiative

Containers are Lightweight VMs!

Containers are Lightweight VMs!

Virtual machines are software computers that provide the same functionality as physical computers. Like physical computers, they run applications and an operating system. However, virtual machines are computer files that run on a physical computer and behave like a physical computer. In other words, virtual machines behave as separate computer systems.”

https://www.vmware.com/topics/glossary/content/virtual-machine

Image -> Container

For containers, those computer files are called images.

Containers are instantiations of an image.

Containers are not VMs

k8s

“Kubernetes is an open-source container orchestration system for automating application deployment, scaling, and management”

Kubernetes

Containers for Data Scientists

Use Case #1 - Reproducible Research

Without Containers

Have:

  1. Code
  2. Data

Don’t Have:

  1. Precise Version
  2. Corresponding packages

Use Case #1 - Reproducible Research

Without Containers

Use Case #1 - Reproducible Research

Without Containers

> library("ggplot2")
Warning message:
package ‘ggplot2’ was built under R version 3.4.4

Use Case #1 - Reproducible Research

With Containers

Everything is packaged up into an immutable image.

As long you can run the container, you can reproduce the research!

Use Case #2 - Resources

Use Case #2 - Resources

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: wp
    image: wordpress
    resources:
      requests:
        memory: "64G"
        cpu: "16"
      limits:
        memory: "128G"
        cpu: "16"

Use Case #2 - Resources

Alternative:

“Terraform is an open-source infrastructure as code software tool created by HashiCorp. It enables users to define and provision a datacenter infrastructure using a high-level configuration language known as Hashicorp Configuration Language, or optionally JSON.”

Use Case #3 - Microservices

Use Case #3 - Microservices

Use Case #3 - Microservices

R code as an API

10 Tricks to Appear Smart in Meetings

Thank you

Questions?

Advertisement

Advertisement

Advertisement